Superlinear parallelisation of the k-nearest neighbor classifier

Author

  • Antal van den Bosch
Abstract

With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelised with a linear speed increase of factor m. In this paper we introduce two methods that in principle can achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers and merges their m nearest neighbor sets at each classification. For our experiments we use Timbl, an implementation of the k-NN classifier that uses a decision-tree-based data structure for retrieving nearest neighbors. In a range of experiments the first method consistently scales linearly. With the second method we observe cases of both superlinear and sublinear scaling. A high variance in feature weights dampens the effect of the second type of parallelisation, due to the strong weight-based search heuristics already built into Timbl. When feature weights exhibit less variance, superlinear scaling can occur, because sub-classifiers trained on 1/m-th of the training set retrieve nearest neighbors relatively faster than a classifier trained on the full training set.
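
To make the second method concrete, here is a minimal Python sketch, assuming scikit-learn's NearestNeighbors in place of Timbl; the function split_knn_predict, the parameters m and k, and the toy data are illustrative assumptions rather than the paper's implementation, and the m sub-classifiers are run in a plain loop instead of on separate processors. The merge step itself is exact: each of the k globally nearest neighbors falls into one of the m parts and is therefore among that part's k nearest, so pooling the m candidate sets and keeping the k closest reproduces the single-classifier decision.

# Sketch of the training-set-splitting method: partition the training data over
# m sub-classifiers, let each return its k nearest neighbors, then merge the m
# candidate sets and keep the k globally nearest before voting.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def split_knn_predict(X_train, y_train, X_test, m=4, k=3):
    # Partition the training set into m roughly equal parts (by index).
    parts = np.array_split(np.arange(len(X_train)), m)
    sub_distances, sub_indices = [], []
    for idx in parts:
        # Each sub-classifier searches only its own 1/m-th of the training data.
        nn = NearestNeighbors(n_neighbors=min(k, len(idx))).fit(X_train[idx])
        dist, local = nn.kneighbors(X_test)        # shapes: (n_test, k)
        sub_distances.append(dist)
        sub_indices.append(idx[local])             # map back to global indices
    # Merge the m neighbor sets: for each test point keep the k nearest overall.
    all_dist = np.hstack(sub_distances)            # (n_test, m*k)
    all_idx = np.hstack(sub_indices)
    order = np.argsort(all_dist, axis=1)[:, :k]
    merged = np.take_along_axis(all_idx, order, axis=1)
    # Majority vote over the labels of the merged k nearest neighbors.
    return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in merged])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)
    print(split_knn_predict(X[:150], y[:150], X[150:], m=4, k=3))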


Similar articles

Superlinear Parallelization of k-Nearest Neighbor Retrieval

With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelised with a linear speed increase of factor m. In this paper we introduce two methods that in principle are able to achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers, and merges their m nearest neighbor sets with eac...

Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)

In this research, two techniques of pixel-based and object-based image analysis were investigated and compared for producing a land use map of the arid Mokhtaran basin, Birjand. Using Landsat satellite imagery from 2015, the classification of land use was performed with three object-based algorithms: supervised fuzzy maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...

Diagnosis of Temporomandibular Disorders Using Local Binary Patterns

Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histograms of oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal...

FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA

Clustering of objects is an important area of research and application in a variety of fields. In this paper we present a technique for data clustering and apply it to data in a closed area. We compare this method with K-nearest neighbor and K-means.

Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
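
For orientation only (this formula is not quoted from the truncated abstract above), the univariate k-nearest neighbor kernel density estimator for a complete, untruncated sample X_1, ..., X_n is commonly written as

\hat{f}_n(x) = \frac{1}{n \, R_k(x)} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{R_k(x)} \right),

where K is a kernel function and R_k(x) is the distance from x to its k-th nearest sample point, so the bandwidth R_k(x) widens in sparse regions and narrows in dense ones.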


Journal:

Volume   Issue

Pages  -

Publication date: 2007